Abstract

The principle of complementarity is quantified in two ways: by a universal uncertainty relation 
valid for arbitrary joint estimates of any two observables from a given measurement setup, and by 
a general uncertainty relation valid for the optimal estimates of the same two observables when the 
state of the system prior to measurement is known. A formula is given for the optimal estimate of 
any given observable, based on arbitrary measurement data and prior information about the state 
of the system, which generalises and provides a more robust interpretation of previous formulas 
for "local expectations" and "weak values" of quantum observables. As an example, the canonical
joint measurement of position $X$ and momentum $P$ corresponds to measuring the commuting
operators $X_J = X + X'$, $P_J = P - P'$, where the primed variables refer to an auxiliary system
in a minimum-uncertainty state. It is well known that $\Delta X_J \, \Delta P_J \geq \hbar$. Here it is shown that
given the same physical experimental setup, and knowledge of the system density operator prior to
measurement, one can make improved joint estimates $\mathcal{X}_{\rm opt}$ and $\mathcal{P}_{\rm opt}$ of $X$ and $P$. These improved
estimates are not only statistically closer to $X$ and $P$: they satisfy
$\Delta\mathcal{X}_{\rm opt} \, \Delta\mathcal{P}_{\rm opt} \geq \hbar/4$, where
equality can be achieved in certain cases. Thus one can do up to four times better than the
standard lower bound (where the latter corresponds to the limit of no prior information). Other
applications include the heterodyne detection of orthogonal quadratures of a single-mode optical 
field, and joint measurements based on Einstein-Podolsky-Rosen correlations. 

PACS numbers: 03.65.Ta 
I. INTRODUCTION



At least four generic types of uncertainty principle can be distinguished in quantum 
theory: 

(i) State preparation: the quantum description of a physical system cannot simultaneously 
assign definite values to all observables; 

(ii) Overlap: different physical states cannot in general be unambiguously distinguished by 
measurement; 

(iii) Disturbance: measurement of one observable necessarily "disturbs" other observables; 

(iv) Complementarity: the experimental arrangements for accurately defining/measuring 
different observables are in general physically incompatible. 

These principles are all negative in content, corresponding to limits on what is possible 
in quantum mechanics. These limits are quantified via associated uncertainty relations. As 
the literature on such uncertainty relations is extensive, only a few general remarks and 
indicative references will be given here to set the context for this paper. 

The "state preparation" uncertainty principle is the best known, and places limitations 
on classical notions of prior knowledge (and hence predictability). The corresponding uncer- 
tainty relations are generally expressed in terms of the spreads of the probability distributions 
of different observables, the prototypical example being the textbook inequality 

$$\Delta X \, \Delta P \geq \hbar/2 \tag{1}$$

for the rms spreads of position and momentum. 

The "overlap" uncertainty principle corresponds to the existence of non-orthogonal states,
and underlies the semi-classical notion that quantum states occupy a phase space area of at
least $2\pi\hbar$. It also separates quantum parameter estimation from its classical counterpart. A
corresponding uncertainty relation is the parameter estimation bound

$$\delta X \, \Delta P \geq \hbar/2 \tag{2}$$

where $\delta X$ is a measure of the error in any (covariant) estimate of the amount by which
a state has been displaced in position, and $\Delta P$ is the rms momentum spread of the state
[1, 2, 3].

The "disturbance" uncertainty principle is connected to early statements by Heisenberg
such as 'every subsequent observation of the position will alter the momentum by an
unknown and undeterminable amount' [4]. Investigation of this principle has proceeded by
examining the distribution of one observable both before and after the measurement of
another observable, and attempting to relate the disturbance of the distribution to the
accuracy of the measurement [5, 6, 7]. However, recent work by Ozawa [8, 9] shows that the
momentum disturbance $\eta(P)$ due to a position measurement having inaccuracy $\epsilon(X)$ can
in fact satisfy $\epsilon(X)\,\eta(P) = 0$. Hence this principle needs to be formulated more carefully,
presumably in relation to valid uncertainty relations such as [8, 9]

$$\epsilon(X)\,\eta(P) + \epsilon(X)\,\Delta P + \Delta X\,\eta(P) \geq \hbar/2. \tag{3}$$

Finally, the fourth uncertainty principle above arises from Bohr's notion of
complementarity [10], and restricts the degree to which joint information about observables can be
obtained from a single experimental setup. However, previous formulations of corresponding
uncertainty relations have only been given for special cases [6, 11, 12, 13, 14, 15, 16, 17].
The most general of these are the Arthurs-Kelly type [6, 11, 13, 17], restricted to
"universally unbiased" joint measurements; and the Martens-deMuynck type [14, 16], restricted to
"non-ideal" joint measurements. For example, if a measurement apparatus simultaneously
outputs two values $X_J$ and $P_J$, that are on average equal to the averages of $X$ and $P$ (for
all input states), then [6, 11, 13]

$$\Delta X_J \, \Delta P_J \geq \hbar. \tag{4}$$

The need to find general uncertainty relations quantifying complementarity, not subject to
any restrictions on measurement, forms the subject of this paper.

To proceed, one first clearly needs to generalise what is meant by a "joint measurement".
For example, neither "universally unbiased" nor "non-ideal" joint measurements include
experiments that are adapted in some way to particular subclasses of states. Yet Bohr defended
complementarity against a number of thought experiments of this type [10], including the
famous Einstein-Podolsky-Rosen (EPR) paradox [10, 18, 19]. In the latter case a joint
measurement of $X$ and $P$ arises via simultaneous measurement of $X$ and $P'$, where $P'$ refers to
the momentum of a (correlated) auxiliary system. Such a joint measurement does not satisfy
either of the "universal unbiasedness" or "non-ideal" restrictions mentioned above.

Indeed, in trying to place fundamental limits on the information which can simultaneously
be gained about two complementary observables, one must consider any and all experimental
setups, without restriction. The simplest and most general possible approach will therefore
be taken in this paper: any measurement is considered to provide a joint measurement of
any two observables. The corresponding logic is that (i) the result of a given measurement
provides information; (ii) this information can be used to make an estimate of any given
observable; and (iii) one may look for universal uncertainty relations associated with such
estimates.

This approach solves the problem of what constitutes a joint measurement in a very
general way (all measurements are permitted). However, there are still two possible strategies
that may be followed to obtain general joint-measurement uncertainty relations. The first
of these is simply to seek uncertainty relations which hold for any estimates, good or bad, of
the observables. The second strategy is to throw away all the bad estimates, and only seek
uncertainty relations for estimates that make the best possible use of any prior information
(after all, why make a particular estimate if the information is available to make a better
one?). Both strategies will be followed in this paper, and corresponding joint-measurement
uncertainty relations are given in Secs. III and IV.

Note that the strategy of making the best use of any available prior information is of some 
interest in its own right, quite aside from joint measurements. For example, a measurement 
of position does not by itself provide a very useful basis for estimating energy. However, 
combining the measurement result with any information available about the system before 
measurement (e.g., its average momentum, or its quantum state, or its entanglement with
an auxiliary system) can lead to a significantly improved estimate. More generally, prior
information helps the experimenter place the detector to minimise null outcomes, and the
quantum communications engineer to optimise the receiver. As emphasised by Trifonov et
al. [17], even the "universally unbiased" bound in Eq. (4) is achieved only by choosing the
experimental setup in dependence on prior information about the system to be measured
(the "balance" parameter $b = \hbar\,\Delta X/\Delta P$ in [11], and the full polarisation state in [17]).

In Sec. II a general formula is given for the best possible estimate of an observable, based 
on an arbitrary measurement and prior information about the state of the system. This 
formula is related to and generalises expressions for "local expectation values" [20, 21, 22] 
and "weak values" [23, 24] of quantum observables. The best possible estimate is also 
determined for the case in which there is no prior information available. Examples are given 
for general energy estimation, and for estimation of the quadratures of a single-mode field 
using optical heterodyne detection. In the latter case the best possible estimates are related 
to the gradient of the Husimi Q-function.

In Sec. III a geometric uncertainty relation is given for the optimal estimates of any
two observables from arbitrary measurement data, assuming that the state prior to mea- 
surement is known. This uncertainty relation implies a trade-off between the dispersions of 
the estimates (i.e., the spreads of the corresponding distributions), and the inaccuracies of 
the estimates (i.e., the degree to which the estimates successfully mimic the corresponding 
observables). A universal lower bound for the inaccuracy of any (possibly non-optimal) esti- 
mate is also given. For the case of heterodyne detection two further inequalities are derived, 
applying to the dispersions and to the inaccuracies of the estimates respectively. It is also 
shown that the optimal estimates resulting from a canonical joint measurement of position 
and momentum, on a known state, satisfy an uncertainty relation with a lower bound 1/4 
of that in Eq. (4). 

In Sec. IV a universal joint-measurement uncertainty relation is derived, valid for any 
estimates (optimal or otherwise) of two observables from a given experimental setup. The 
derivation shares a formal link with Ozawa's proof of Eq. (3) [8, 9], and modification of 
the derivation leads to stronger uncertainty relations such as Eq. (4) for the special case 
of universally unbiased measurements. Results are applied to a discussion of the above- 
mentioned EPR paradox [18], and to quadrature estimation based on prior information 
about the averages of certain observables. 

Some conclusions are given in Sec. V. 

II. MAKING THE BEST POSSIBLE ESTIMATE 

Consider an arbitrary measurement $\mathcal{M}$ having possible results $\{m\}$, and with statistics
given by

$$p(m|\rho) = {\rm tr}[\rho M_m] \tag{5}$$

for a system described by density operator $\rho$. Since the probabilities must be positive and
sum to unity, the operators $\{M_m\}$ must be positive and sum to the unit operator, and
hence form a probability operator measure (POM) [1, 2, 25]. In the interests of generality,
no further restrictions or specific measurement models are assumed.
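As a concrete illustration of Eq. (5), the following minimal sketch builds a three-outcome POM on a qubit and checks that its elements sum to the unit operator and that the outcome probabilities sum to one. The "trine" POM and the particular density operator are hypothetical choices made only for this example; they do not appear in the paper.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def trine_pom():
    """Hypothetical example POM: M_k = (1/3)(1 + n_k . sigma), k = 0, 1, 2,
    with unit vectors n_k spaced by 120 degrees in the x-z plane."""
    pom = []
    for k in range(3):
        theta = 2 * math.pi * k / 3
        nx, nz = math.sin(theta), math.cos(theta)
        pom.append([[(1 + nz) / 3, nx / 3],
                    [nx / 3, (1 - nz) / 3]])
    return pom

# Illustrative state rho = (1/2)(1 + r . sigma) with Bloch vector r = (0.3, 0, 0.4).
rho = [[0.7, 0.15],
       [0.15, 0.3]]

pom = trine_pom()
# Outcome probabilities p(m|rho) = tr[rho M_m], as in Eq. (5).
probs = [trace(matmul(rho, M)) for M in pom]
# Sum of the POM elements; should be the 2x2 unit matrix.
total = [[sum(M[i][j] for M in pom) for j in range(2)] for i in range(2)]
```

Each element here is proportional to a projector but the three elements are not mutually orthogonal, so this measurement is not equivalent to any Hermitian operator on the qubit.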

A notation is adopted whereby a measurement, its corresponding POM, and the
corresponding observable quantity, are all denoted by the same scripted character, e.g., $\mathcal{M}$. Any
Hermitian operators associated with the measurement will be denoted via related upper-case
Roman characters, e.g., $M_m$.

In some cases $\mathcal{M}$ may be equivalently described by a Hermitian operator $M$. In such a
case $M_m$ is just the projection onto the eigenspace associated with eigenvalue $m$ of $M$, and
$M = \sum_m m M_m$. If $M$ is non-degenerate with eigenkets $\{|m\rangle\}$, and the system is described
by a pure state $\rho = |\psi\rangle\langle\psi|$, Eq. (5) reduces to the familiar expression
$p(m|\psi) = |\langle m|\psi\rangle|^2$.

As is well known, however, there are many non-trivial measurements that are not equivalent 
to some Hermitian operator acting on the Hilbert space of the system [1, 2, 25]. 

As discussed in the Introduction, it may often be desirable to make an estimate of some
observable based on the result of a given measurement and any available prior information.
This Section is therefore concerned with answering the following question: for a quantum
system described by density operator $\rho$, what is the best possible estimate one can make of
some observable, $\mathcal{A}$, from a measurement of $\mathcal{M}$ with result $m$?

A. Using prior information: a special case 

It is convenient, for the purpose of introducing the necessary concepts, to first consider
the above question in the special case that $\mathcal{M}$ and $\mathcal{A}$ correspond to respective Hermitian
operators $M$ and $A$. This case was also considered briefly in Ref. [26].

In particular, suppose that one seeks the best possible estimate of $A$ from the
measurement of a Hermitian operator $M$ having eigenkets $\{|m\rangle\}$, where for simplicity it will be
assumed that the system is in a known pure state $|\psi\rangle$ prior to measurement. It follows that
any estimate $f(m)$ of $A$ from measurement result $M = m$ is equivalent to measurement of
the Hermitian operator $f(M) = \sum_m f(m)|m\rangle\langle m|$. One may therefore represent the estimate
as

$$f(M) = A + N_f,$$

i.e., as the sum of the operator to be estimated, $A$, and a "noise operator", $N_f$ [6, 8, 13].

Now, the best possible estimate will of course depend on the criterion of optimality used
to define "best possible". One obvious criterion is that the noise should be "small" on
average, i.e., the quantity

$$\langle N_f^2\rangle = \langle\psi|[f(M) - A]^2|\psi\rangle = D_\psi(f(M), A)^2 \tag{6}$$

should be small. Here $D_\psi(A,B) := \langle(A - B)^2\rangle^{1/2}$ denotes the statistical deviation between
Hermitian operators $A$ and $B$ (see also Appendix). The best possible estimate is therefore
defined as corresponding to the choice of $f$ that minimises the statistical deviation between
the observable and its estimate.

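To determine this best possible estimate, note that Eq. (6) can be rewritten as

$$
\begin{aligned}
D_\psi(f(M),A)^2 &= \langle A^2\rangle + \sum_m |\langle m|\psi\rangle|^2 f(m)^2
 - \sum_m f(m)\left[\langle\psi|m\rangle\langle m|A|\psi\rangle + {\rm c.c.}\right] \\
&= \langle A^2\rangle - \sum_m |\langle m|\psi\rangle|^2
 \left[{\rm Re}\,\frac{\langle m|A|\psi\rangle}{\langle m|\psi\rangle}\right]^2
 + \sum_m |\langle m|\psi\rangle|^2
 \left[f(m) - {\rm Re}\,\frac{\langle m|A|\psi\rangle}{\langle m|\psi\rangle}\right]^2 .
\end{aligned}
\tag{7}
$$

Only the last term depends on the estimate, and is nonnegative. Hence the minimum
possible statistical deviation or "noise" corresponds to the choice [26]

$$f(m) = \tilde{A}_{\rm opt}(m|\psi) := {\rm Re}\,\frac{\langle m|A|\psi\rangle}{\langle m|\psi\rangle}. \tag{8}$$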

A tilde is used to distinguish this quantity from an operator. 

Thus, when statistical deviation is used as the criterion of optimality, the optimal estimate
of $A$, from measurement result $M = m$ on state $|\psi\rangle$, is given by $\tilde{A}_{\rm opt}(m|\psi)$ in Eq. (8). It is
only possible to make this estimate when the appropriate prior information - the state prior
to measurement - is known. The case where no prior information is available is considered
in Sec. II.C below.
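The optimal estimate of Eq. (8) can be checked numerically on a small system. The sketch below is an illustration only: the choice of a qubit, the operator $\sigma_x$, a measurement in the $\sigma_z$ eigenbasis, and the angles defining the state are all assumptions made for the example. It evaluates the real part of $\langle m|A|\psi\rangle/\langle m|\psi\rangle$ and the mean-square noise of Eq. (6).

```python
import cmath
import math

def apply(A, psi):
    """Matrix-vector product A|psi>, written in the measurement eigenbasis."""
    n = len(psi)
    return [sum(A[m][k] * psi[k] for k in range(n)) for m in range(n)]

def optimal_estimate(A, psi):
    """Eq. (8): Re(<m|A|psi> / <m|psi>) for each outcome m."""
    A_psi = apply(A, psi)
    return [(A_psi[m] / psi[m]).real for m in range(len(psi))]

def noise(A, psi, f):
    """Mean-square noise <psi|(f(M) - A)^2|psi> of Eq. (6).
    Since f(M) - A is Hermitian this equals the squared norm of (f(M) - A)|psi>."""
    A_psi = apply(A, psi)
    return sum(abs(f[m] * psi[m] - A_psi[m]) ** 2 for m in range(len(psi)))

sigma_x = [[0, 1], [1, 0]]
theta, phi = 1.0, 0.7                      # arbitrary illustrative angles
psi = [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]

f_opt = optimal_estimate(sigma_x, psi)
probs = [abs(c) ** 2 for c in psi]

# The average of the optimal estimate reproduces <sigma_x>.
mean_estimate = sum(p * f for p, f in zip(probs, f_opt))
mean_sigma_x = sum((c.conjugate() * d).real for c, d in zip(psi, apply(sigma_x, psi)))

# Any perturbation of the optimal estimate increases the noise.
noise_opt = noise(sigma_x, psi, f_opt)
noise_shifted = noise(sigma_x, psi, [f_opt[0] + 0.1, f_opt[1]])
```

In this sketch `noise_shifted` exceeds `noise_opt` by the outcome probability times the square of the shift, which is the last term of Eq. (7).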

The formula on the right-hand side of Eq. (8) has in fact appeared previously in the
literature in a variety of other contexts: as the "local expectation value" of the operator
$A$ relative to $M$ for state $|\psi\rangle$ [20, 21, 22]; as the "weak value" of the operator $A$ relative
to pre-selected state $|\psi\rangle$ and post-selected state $|m\rangle$ [23, 24, 27, 28]; and as the "classical
component" of $A$ with respect to $M$ for state $|\psi\rangle$ [26, 29, 30]. However, only the above
"estimation" context appears to provide a robust interpretation.

For example, the expression in Eq. (8) can be negative for a positive operator A, which 
undermines its interpretation as either a "value" or a "classical" component of A. In contrast, 
the fact that the best possible estimate of some positive observable, from the measurement 
of a second incompatible observable, can be negative on occasion merely provides a nice 
signature of the difference between quantum and classical estimation theory [31] (at least in 
the case where statistical deviation is used as the sole criterion of optimality). While one
could of course restrict attention to estimates that fall within the eigenvalue range of $A$,
the estimate in Eq. (8) still remains of fundamental interest in providing an absolute lower 
bound for the statistical deviation of any estimate. It should be noted that, in any case, all 
examples considered in this paper satisfy this restriction (with the exception of Eq. (16) in 
Sec. II.D). 



B. Using prior information: the general case 

The question posed at the beginning of this section, of how to determine the best possible
estimate of an observable $\mathcal{A}$ from measurement of a general POM observable $\mathcal{M}$ on a system
described by a known density operator $\rho$, may now be addressed. Clearly, it is first necessary
to suitably generalise in some way the criterion of optimality used in the previous section.

For the case where $\mathcal{A}$ corresponds to some Hermitian operator $A$, the generalisation of
statistical deviation turns out to be quite straightforward. In particular, as discussed in the
Appendix, the natural definition of the statistical deviation between a Hermitian operator
$A$ and a POM observable $\mathcal{B} \equiv \{B_b\}$, for a given state $\rho$, is

$$D_\rho(A, \mathcal{B})^2 := \sum_b {\rm tr}[(A - b)\rho(A - b)B_b]. \tag{9}$$

Note that this reduces to $D_\psi(A, B)$ in Eq. (6) when $\mathcal{B}$ corresponds to some Hermitian
operator $B$. As shown in the Appendix, the derivation in Eq. (7) is easily generalised to
give the optimal estimate

$$f(m) = \tilde{A}_{\rm opt}(m|\rho) := \frac{{\rm tr}[\rho(M_m A + A M_m)]}{2\,{\rm tr}[\rho M_m]} \tag{10}$$

of $A$ from measurement result $\mathcal{M} = m$. The case where $\mathcal{A}$ does not correspond to a Hermitian
operator is also discussed in the Appendix.

Eq. (10) clearly generalises Eq. (8), and has several properties worth noting. First, it
follows via Eq. (5) that the optimal estimate is always unbiased, i.e.,

$$\sum_m p(m|\rho)\,\tilde{A}_{\rm opt}(m|\rho) = {\rm tr}[\rho A] = \langle A\rangle. \tag{11}$$

Second, if the system is initially in some eigenstate $|a\rangle$ of $A$, then $\tilde{A}_{\rm opt}(m|\rho) = a$,
independently of the actual measurement result. Third, if $\mathcal{M}$ corresponds to an ideal measurement
of a Hermitian operator which commutes with $A$, one has the classical repeatability property

$$\tilde{A}_{\rm opt}(m|\rho) = \tilde{A}_{\rm opt}(m|\bar{\rho}), \tag{12}$$

where $\bar{\rho} := \sum_{m'} M_{m'}\rho M_{m'}$ describes the post-measurement ensemble.

It is convenient to denote the physical observable associated with the optimal estimate
in Eq. (10) by $\mathcal{A}_{\rm opt}$. Measurement of $\mathcal{A}_{\rm opt}$ is carried out by measuring $\mathcal{M}$, and for result
$m$ attributing the outcome $\tilde{A}_{\rm opt}(m|\rho)$ to $\mathcal{A}_{\rm opt}$. One may refer to $\mathcal{A}_{\rm opt}$ as the compatible
component of $\mathcal{A}$ with respect to $\mathcal{M}$. Note from Eq. (10) that compatible components form
a linear algebra, i.e.,

$$(\mathcal{A} + \lambda\mathcal{B})_{\rm opt} = \mathcal{A}_{\rm opt} + \lambda\mathcal{B}_{\rm opt}.$$
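The first two properties of the general estimate in Eq. (10), unbiasedness as in Eq. (11) and the eigenstate property, can be illustrated with a short numerical sketch. The qubit "trine" POM, the density operator and the choice $A = \sigma_z$ below are hypothetical examples chosen for convenience, not taken from the paper.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def trine_pom():
    """Hypothetical example POM: M_k = (1/3)(1 + n_k . sigma), n_k in the x-z plane."""
    pom = []
    for k in range(3):
        theta = 2 * math.pi * k / 3
        nx, nz = math.sin(theta), math.cos(theta)
        pom.append([[(1 + nz) / 3, nx / 3],
                    [nx / 3, (1 - nz) / 3]])
    return pom

def optimal_estimate(rho, A, M):
    """Eq. (10): tr[rho (M A + A M)] / (2 tr[rho M])."""
    numerator = trace(matmul(rho, matmul(M, A))) + trace(matmul(rho, matmul(A, M)))
    return numerator / (2 * trace(matmul(rho, M)))

sigma_z = [[1, 0], [0, -1]]
rho = [[0.7, 0.15], [0.15, 0.3]]           # illustrative mixed state, <sigma_z> = 0.4
pom = trine_pom()

probs = [trace(matmul(rho, M)) for M in pom]
estimates = [optimal_estimate(rho, sigma_z, M) for M in pom]
mean_estimate = sum(p * e for p, e in zip(probs, estimates))   # Eq. (11)

# For an eigenstate of sigma_z (eigenvalue +1) every outcome yields the eigenvalue.
eigenstate = [[1, 0], [0, 0]]
eigen_estimates = [optimal_estimate(eigenstate, sigma_z, M) for M in pom]
```

Here `mean_estimate` should reproduce ${\rm tr}[\rho\sigma_z]$, and every entry of `eigen_estimates` should equal the eigenvalue, whatever the outcome.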

C. No prior information 

Consider now the case where there is no information available about the state of the
system prior to measurement. The statistical deviation therefore cannot be calculated, nor
the estimates in Eqs. (8) and (10). The best possible estimate of $\mathcal{A}$ from some measurement
$\mathcal{M}$ must instead be defined via some state-independent criterion of optimality.

One suitable criterion is provided by a generalisation of the Hilbert-Schmidt distance
$d(A,B)^2 = {\rm tr}[(A - B)^2]$ between two Hermitian operators. Such a generalisation, $d(A, \mathcal{M})$,
for a Hermitian operator $A$ and a POM observable $\mathcal{M}$, has recently been given [32]. The
estimate "closest" to $A$, in the sense of minimising this distance, follows directly as (see
Appendix)

$$\tilde{A}^0_{\rm opt}(m) := {\rm tr}[A M_m]/{\rm tr}[M_m]. \tag{13}$$

The physical observable corresponding to this estimate will be denoted by $\mathcal{A}^0_{\rm opt}$. Note from
Eq. (10) that $\mathcal{A}^0_{\rm opt} = \mathcal{A}_{\rm opt}$ in the case that $\rho$ is a maximally mixed state, i.e., $\rho \propto 1$.

The estimate in Eq. (13) is typically biased - after all, there is no prior information
available about $\langle A\rangle$ to feed into such an estimate. However, depending on the relationship
between $\mathcal{A}$ and $\mathcal{M}$, it is possible for the estimate to be universally unbiased, as will be seen
for heterodyne detection in Sec. II.E below. Further, in cases where the estimate is only
linearly biased, it is possible to trade "distance" for "bias". For example, if

$$\sum_m M_m\,{\rm tr}[A M_m]/{\rm tr}[M_m] = A + r \tag{14}$$

for some constant $r$, then a universally unbiased estimate is obtained by replacing $\tilde{A}^0_{\rm opt}(m)$
with $\tilde{A}^0_{\rm opt}(m) - r$.
As a more general example, consider the estimate of the spin, $\mathbf{S} = \hbar\boldsymbol{\sigma}/2$, of a spin-1/2
particle, from a measurement result $\mathbf{m}$ corresponding to a general POM
$\{M_{\mathbf m} = q_{\mathbf m}(1 + \boldsymbol{\sigma}\cdot\mathbf{m})\}$. Here $\mathbf{m}$ ranges over some subset $R$ of the Bloch ball, and $\{q_{\mathbf m}\}$ is any probability
distribution on $R$ satisfying $\sum_{\mathbf m} q_{\mathbf m}\mathbf{m} = 0$. The best possible estimate of $\mathbf{S}$ from result
$\mathbf{m}$ follows from Eq. (13) as the linearly biased estimate $\hbar\mathbf{m}/2$. The associated universally
unbiased estimate is $\hbar\Lambda^{-1}\mathbf{m}/2$, where $\Lambda$ denotes the matrix $\sum_{\mathbf m} q_{\mathbf m}\mathbf{m}\mathbf{m}^T$ (note that the
inverse exists providing $R$ contains three linearly independent members). A similar result
holds on general Hilbert spaces for trace-class $M_m$ and $A$, with the components of $\boldsymbol{\sigma}$ replaced
by a linearly independent basis set of trace-free Hermitian operators.
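The two spin estimates quoted above can be verified in a few lines. The following worked steps are a sketch that assumes the POM elements have the form $M_{\mathbf m} = q_{\mathbf m}(1 + \boldsymbol\sigma\cdot\mathbf m)$ with $\sum_{\mathbf m} q_{\mathbf m}\mathbf m = 0$ and $\Lambda = \sum_{\mathbf m} q_{\mathbf m}\mathbf m\mathbf m^T$, and uses only the Pauli identities ${\rm tr}[\sigma_j] = 0$ and ${\rm tr}[\sigma_j\sigma_k] = 2\delta_{jk}$.

```latex
\begin{align*}
% No-prior estimate, Eq. (13), applied componentwise to S = \hbar\sigma/2:
\frac{{\rm tr}[\mathbf{S}\,M_{\mathbf m}]}{{\rm tr}[M_{\mathbf m}]}
  &= \frac{\hbar}{2}\,
     \frac{q_{\mathbf m}\,{\rm tr}[\boldsymbol\sigma\,(1+\boldsymbol\sigma\cdot\mathbf m)]}
          {q_{\mathbf m}\,{\rm tr}[1+\boldsymbol\sigma\cdot\mathbf m]}
   = \frac{\hbar}{2}\,\frac{2\,\mathbf m}{2}
   = \frac{\hbar\,\mathbf m}{2}, \\[4pt]
% Operator average of this estimate: linear in S, hence "linearly biased":
\sum_{\mathbf m} M_{\mathbf m}\,\frac{\hbar\,\mathbf m}{2}
  &= \frac{\hbar}{2}\sum_{\mathbf m} q_{\mathbf m}\,\mathbf m
     \,(1+\mathbf m\cdot\boldsymbol\sigma)
   = \frac{\hbar}{2}\,\Lambda\,\boldsymbol\sigma
   = \Lambda\,\mathbf S, \\[4pt]
% Rescaling by \Lambda^{-1} removes the bias for every input state:
\sum_{\mathbf m} M_{\mathbf m}\,\frac{\hbar\,\Lambda^{-1}\mathbf m}{2}
  &= \Lambda^{-1}\Lambda\,\mathbf S = \mathbf S .
\end{align*}
```

The second line uses $\sum_{\mathbf m} q_{\mathbf m}\mathbf m = 0$ to drop the term proportional to the unit operator.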

D. Example: energy estimation 

Making the best possible estimate of energy from the measurement of various observables 
is considered here, to indicate the types of expressions that can arise. 

First, for a particle with Hamiltonian operator $H$, consider the case where all that is
known about the system is that it is in thermal equilibrium corresponding to temperature
$T$. The particle is therefore described by the density operator proportional to $e^{-\beta H}$, where
$\beta = 1/(kT)$. It follows from Eq. (10) that the optimal estimate of the energy of the system,
from measurement result $\mathcal{M} = m$, is given by

$$\tilde{E}_{\rm opt}(m|T) = -(\partial/\partial\beta)\,\ln{\rm tr}[e^{-\beta H}M_m]. \tag{15}$$

Thus ${\rm tr}[e^{-\beta H}M_m]$ is a kind of generalised partition function. For the particular example of
a position measurement, on a 1-dimensional harmonic oscillator of mass $m$ and frequency
$\omega$, one obtains the quadratic estimate

$$\tilde{E}_{\rm opt}(x|T) = A_T + B_T x^2,$$

where $A_T = (1/2)\hbar\omega\coth(\beta\hbar\omega)$ and $B_T = (1/2)m\omega^2\,{\rm sech}^2(\beta\hbar\omega/2)$. Note that in the
zero-temperature limit this estimate reduces to the groundstate energy $\hbar\omega/2$, independently of
the actual measurement result $x$. In the classical limit $\hbar \to 0$ the estimate reduces to
$(1/2)kT + (1/2)m\omega^2x^2$, i.e., to the sum of the average kinetic energy and the potential
energy (this result holds more generally).
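The quadratic estimate and its two limits can be checked numerically. The sketch below assumes the standard closed form for the diagonal of the harmonic-oscillator thermal kernel, $\langle x|e^{-\beta H}|x\rangle \propto [\sinh(\beta\hbar\omega)]^{-1/2}\exp[-(m\omega/\hbar)x^2\tanh(\beta\hbar\omega/2)]$, which is not stated in the text, and compares a finite-difference evaluation of Eq. (15) with $A_T + B_T x^2$. The function names and default units ($m = \omega = \hbar = k = 1$) are illustrative choices.

```python
import math

def energy_estimate(x, beta, m=1.0, omega=1.0, hbar=1.0):
    """Quadratic estimate A_T + B_T x^2 for a position measurement."""
    a_t = 0.5 * hbar * omega / math.tanh(beta * hbar * omega)
    b_t = 0.5 * m * omega ** 2 / math.cosh(0.5 * beta * hbar * omega) ** 2
    return a_t + b_t * x ** 2

def log_position_density(x, beta, m=1.0, omega=1.0, hbar=1.0):
    """ln <x|exp(-beta H)|x> for the oscillator (assumed standard closed form)."""
    s = math.sinh(beta * hbar * omega)
    return (0.5 * math.log(m * omega / (2 * math.pi * hbar * s))
            - (m * omega / hbar) * x ** 2 * math.tanh(0.5 * beta * hbar * omega))

# Finite-difference version of Eq. (15): -(d/d beta) ln tr[exp(-beta H) M_x].
x, beta, step = 1.3, 0.8, 1e-5
finite_difference = -(log_position_density(x, beta + step)
                      - log_position_density(x, beta - step)) / (2 * step)
closed_form = energy_estimate(x, beta)

# Zero-temperature limit: ground-state energy hbar*omega/2, independent of x.
cold = energy_estimate(2.0, beta=50.0)

# Classical limit hbar -> 0 at kT = 1: kT/2 + m omega^2 x^2 / 2 = 0.5 + 2.0.
classical = energy_estimate(2.0, beta=1.0, hbar=1e-4)
```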

Second, for a particle with Hamiltonian $H = P^2/(2m) + V(X)$ in a known pure state $|\psi\rangle$,
the best possible estimate of energy from a measurement of position $X$ follows from Eq. (8)
as

$$\tilde{E}_{\rm opt}(x|\psi) = |\nabla S|^2/(2m) + V(x) + Q(x), \tag{16}$$

where $\langle x|\psi\rangle = R\exp(iS/\hbar)$, and $Q(x) = -\hbar^2/(2m)\,\nabla^2 R/R$ is the so-called "quantum
potential" [21]. Note that $Q(x)$ arises here in the context of the best possible estimate of the
kinetic energy [i.e., the possibly negative quantity $|\nabla S|^2/(2m) + Q(x)$], with no relation to
a real potential energy.

Third and finally, consider a single-mode optical field with annihilation operator $a$ and
Hamiltonian $H = \hbar\omega a^\dagger a$. An inefficient measurement of photon number, via a photodetector
having quantum efficiency $\eta$, corresponds to the POM $\{M_m(\eta)\}$ with number-state expansion
$M_m(\eta) = \sum_r |m+r\rangle\langle m+r|\;{}^{m+r}C_r\,\eta^m(1-\eta)^r$ [16, 25]. If there is no prior information about
the state of the field prior to measurement, the best possible estimate of the energy of the
field then follows from Eq. (13) as

$$\tilde{E}^0_{\rm opt}(m) = \hbar\omega[(m+1)/\eta - 1], \tag{17}$$

using the identity $\sum_r {}^{m+r}C_r\,x^r = (1-x)^{-m-1}$. This estimate is linearly biased [with $r =
1/\eta - 1$ in Eq. (14)], with the associated universally unbiased estimate given by $\hbar\omega m/\eta$.
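The estimate in Eq. (17) can be reproduced by summing the number-state expansion of the detector POM directly. The sketch below works in units of $\hbar\omega$, truncates the sum over $r$ at a cutoff `r_max` (an assumption of the example, harmless when $(1-\eta)^{r}$ has decayed), and compares the ratio ${\rm tr}[a^\dagger a\,M_m]/{\rm tr}[M_m]$ of Eq. (13) with $(m+1)/\eta - 1$. The function names are illustrative.

```python
import math

def pom_weight(m, r, eta):
    """Coefficient of |m+r><m+r| in M_m(eta): C(m+r, r) eta^m (1 - eta)^r."""
    return math.comb(m + r, r) * eta ** m * (1 - eta) ** r

def pom_trace(m, eta, r_max=400):
    """tr[M_m(eta)], truncated at r_max terms."""
    return sum(pom_weight(m, r, eta) for r in range(r_max))

def no_prior_photon_estimate(m, eta, r_max=400):
    """Eq. (13) with A = photon number: tr[N M_m] / tr[M_m], truncated at r_max."""
    numerator = sum((m + r) * pom_weight(m, r, eta) for r in range(r_max))
    return numerator / pom_trace(m, eta, r_max)

m, eta = 3, 0.6                             # illustrative count and efficiency
numeric = no_prior_photon_estimate(m, eta)
closed_form = (m + 1) / eta - 1             # Eq. (17) in units of hbar*omega
unbiased = closed_form - (1 / eta - 1)      # subtract the constant bias of Eq. (14)
```

Subtracting the constant $1/\eta - 1$ leaves $m/\eta$, the familiar rescaling of the photocount by the detector efficiency.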

E. Example: heterodyne detection 

For a single-mode optical field with annihilation operator $a$, the quadrature observables
$X_1 = (a + a^\dagger)/2$, $X_2 = (a - a^\dagger)/2i$ have commutator $[X_1, X_2] = i/2$, and hence are analogous
to the position and momentum observables of a quantum particle (with $\hbar$ replaced by 1/2).
In particular, $X_1$ and $X_2$ cannot be measured simultaneously to an arbitrary accuracy.

However, in optical heterodyne detection, one introduces an auxiliary imageband field
with annihilation operator $b$, and simultaneously measures the real and imaginary parts of
the operator $a + b^\dagger$ [33, 34, 35], i.e., one measures the commuting observables

$$X_{1,J} = X_1 + Y_1, \qquad X_{2,J} = X_2 - Y_2, \tag{18}$$

where $Y_1$ and $Y_2$ denote the corresponding quadratures of the imageband field. This may be
interpreted as corresponding to an approximate joint measurement of $X_1$ and $X_2$, subject
to imageband noise.
Clearly this joint measurement is formally equivalent to the canonical joint
measurement of position $X$ and momentum $P$ of a quantum particle, referred to in the Abstract,
where one introduces an auxiliary particle with corresponding observables $X'$ and $P'$, and
simultaneously measures

$$X_J = X + X', \qquad P_J = P - P'. \tag{19}$$

This formal equivalence allows one to map results from one context to the other.

For the general case of an uncorrelated imageband field described by density operator $\rho_i$,
the measurement statistics of heterodyne detection correspond to a continuous POM $\{M_\alpha\}$,
with $M_\alpha = \pi^{-1}D(\alpha)\rho_i' D(\alpha)^\dagger$ [36]. Here $\alpha$ denotes the complex eigenvalue $\alpha_1 + i\alpha_2$ of $a + b^\dagger$,
$D(\alpha)$ denotes the Glauber displacement operator $\exp(\alpha a^\dagger - \alpha^* a)$, and $\rho_i'$ is defined by

$$\rho_i' := \sum_{m,n} |m\rangle_a\,{}_a\langle n|\,(-1)^{m+n}\;{}_b\langle m|\rho_i|n\rangle_b^{*},$$

where the subscripts $a$ and $b$ refer to number states of the signal and imageband fields
respectively.

For simplicity, attention will be restricted in what follows to the case of a vacuum-state
imageband field. For this case $\rho_i' = \rho_i = |0\rangle\langle 0|$, and hence the measurement is described by
the well known coherent-state POM [1, 2, 25, 35], with

$$M_\alpha = \pi^{-1}|\alpha\rangle\langle\alpha|,$$

and associated measurement statistics given by the Husimi Q-function

$$Q(\alpha) = \pi^{-1}\langle\alpha|\rho|\alpha\rangle. \tag{20}$$

Now, suppose first that there is no prior information available about the state of the field.
The best possible estimate of the quadrature $X_1$, from measurement result $\alpha$, then follows
from Eq. (13) as the estimate

$$\tilde{X}^0_{1,{\rm opt}}(\alpha) = \langle\alpha|X_1|\alpha\rangle/\langle\alpha|\alpha\rangle = \alpha_1. \tag{21}$$

Similarly, the best possible estimate of $X_2$ in the case of no prior information is given by

$$\tilde{X}^0_{2,{\rm opt}}(\alpha) = \alpha_2. \tag{22}$$

Thus the best possible estimates are directly given by the measurement result $\alpha$, i.e., they
are equivalent to measurement of $X_{1,J}$ and $X_{2,J}$ in Eq. (18). More generally, the best possible
estimate of a general Hermitian observable $f(a, a^\dagger)$, when no prior information is available,
follows from Eq. (13) as $f^{(n)}(\alpha, \alpha^*)$, where $f^{(n)}$ denotes the normally-ordered form of $f$.

The situation changes markedly when prior information about the state of the system
is available. In particular, the best possible estimate of $X_1$ for a measurement on a known
state $\rho$ follows via Eq. (10) as

$$\tilde{X}_{1,{\rm opt}}(\alpha|\rho) = \frac{\langle\alpha|X_1\rho + \rho X_1|\alpha\rangle}{2\langle\alpha|\rho|\alpha\rangle}
 = \frac{1}{2}\,{\rm Re}\left\{\alpha + \langle\alpha|a\rho|\alpha\rangle/\langle\alpha|\rho|\alpha\rangle\right\}. \tag{23}$$

Thus the direct "no prior information" estimate, $\alpha_1 = {\rm Re}\,\alpha$ in Eq. (21), provides only half
of the input to the more general estimate of $X_1$. The other half depends on the state, and
is typically a highly nonlinear function of both $\alpha_1$ and $\alpha_2$. One has a similar estimate

$$\tilde{X}_{2,{\rm opt}}(\alpha|\rho) = \frac{1}{2}\,{\rm Im}\left\{\alpha + \langle\alpha|a\rho|\alpha\rangle/\langle\alpha|\rho|\alpha\rangle\right\} \tag{24}$$

for the quadrature $X_2$, where again the "no prior information" estimate, $\alpha_2 = {\rm Im}\,\alpha$, provides
only half the input.

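Further insight into these best possible estimates is gained by expressing them solely in
terms of the Husimi Q-function $Q(\alpha)$ in Eq. (20). In particular, noting that variation with
respect to $\alpha$ gives

$$
\begin{aligned}
\delta\langle\alpha|\rho|\alpha\rangle &= \langle\alpha|D(\delta\alpha)^\dagger\rho D(\delta\alpha)|\alpha\rangle - \langle\alpha|\rho|\alpha\rangle \\
&= \langle\alpha|[\rho, a^\dagger]|\alpha\rangle\,\delta\alpha + \langle\alpha|[a, \rho]|\alpha\rangle\,\delta\alpha^*,
\end{aligned}
\tag{25}
$$

one may replace $a\rho$ by $[a, \rho] + \rho a$ in Eqs. (23) and (24) to obtain

$$\tilde{X}_{j,{\rm opt}}(\alpha|\rho) = \alpha_j + (1/4)(\partial/\partial\alpha_j)\log Q \tag{26}$$

for $j = 1, 2$. Hence the best possible estimates differ significantly from $\alpha_1$ and $\alpha_2$ precisely
when the gradient of the logarithm of the probability distribution, at the point corresponding
to the measurement outcome, is large.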

As examples, consider the cases where the field is known to be in a coherent state $|\beta\rangle$
and in a number state $|n\rangle$. One then finds from Eqs. (23) and (24), or equivalently from
Eq. (26),

$$\tilde{X}_{1,{\rm opt}}(\alpha|\beta) + i\tilde{X}_{2,{\rm opt}}(\alpha|\beta) = \frac{1}{2}(\alpha + \beta),$$
$$\tilde{X}_{1,{\rm opt}}(\alpha|n) + i\tilde{X}_{2,{\rm opt}}(\alpha|n) = \frac{1}{2}\alpha\left(1 + n/|\alpha|^2\right)$$

respectively.
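The coherent-state and number-state results can be cross-checked against the gradient formula of Eq. (26). The sketch below assumes the standard Husimi functions $Q = \pi^{-1}e^{-|\alpha-\beta|^2}$ for a coherent state and $Q = \pi^{-1}|\alpha|^{2n}e^{-|\alpha|^2}/n!$ for a number state (these closed forms are not written out in the text), and evaluates the gradient of $\log Q$ by central finite differences. All numerical values are arbitrary illustrations.

```python
import math

def log_q_coherent(a1, a2, beta):
    """log Q for a coherent state |beta>, assuming Q = exp(-|alpha - beta|^2) / pi."""
    return -((a1 - beta.real) ** 2 + (a2 - beta.imag) ** 2) - math.log(math.pi)

def log_q_number(a1, a2, n):
    """log Q for a number state |n>, assuming Q = |alpha|^(2n) exp(-|alpha|^2) / (pi n!)."""
    r2 = a1 ** 2 + a2 ** 2
    return n * math.log(r2) - r2 - math.log(math.pi) - math.lgamma(n + 1)

def optimal_quadratures(log_q, a1, a2, step=1e-6):
    """Eq. (26): alpha_j + (1/4) d(log Q)/d(alpha_j), returned as X1 + i X2."""
    d1 = (log_q(a1 + step, a2) - log_q(a1 - step, a2)) / (2 * step)
    d2 = (log_q(a1, a2 + step) - log_q(a1, a2 - step)) / (2 * step)
    return complex(a1 + d1 / 4, a2 + d2 / 4)

alpha = 0.8 + 0.3j                          # illustrative heterodyne outcome
beta = 1.5 - 0.4j                           # illustrative coherent amplitude
n = 3                                       # illustrative photon number

coherent_estimate = optimal_quadratures(
    lambda x, y: log_q_coherent(x, y, beta), alpha.real, alpha.imag)
number_estimate = optimal_quadratures(
    lambda x, y: log_q_number(x, y, n), alpha.real, alpha.imag)

coherent_expected = (alpha + beta) / 2
number_expected = 0.5 * alpha * (1 + n / abs(alpha) ** 2)
```

For the number state the estimate is pushed radially outward from the raw outcome when $|\alpha|^2 < n$ and pulled inward when $|\alpha|^2 > n$.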

Finally, to preview the effect of prior information on joint-measurement uncertainty
relations, the uncertainties of the estimates $\mathcal{X}_{1,{\rm opt}}$ and $\mathcal{X}_{2,{\rm opt}}$ will be calculated here for the above
coherent-state example. These estimates are equivalent to the measurement of $(X_{1,J} + \beta_1)/2$
and $(X_{2,J} + \beta_2)/2$ respectively, where $\beta = \beta_1 + i\beta_2$, and hence, using Eq. (18),

$${\rm Var}\,\mathcal{X}_{1,{\rm opt}} = (1/4)\,{\rm Var}\,X_{1,J} = ({\rm Var}\,X_1 + {\rm Var}\,Y_1)/4 = 1/8,$$

with a similar result for ${\rm Var}\,\mathcal{X}_{2,{\rm opt}}$. One therefore obtains the uncertainty product

$$\Delta\mathcal{X}_{1,{\rm opt}}\,\Delta\mathcal{X}_{2,{\rm opt}} = 1/8 \tag{27}$$

for this example, which is four times better than the corresponding product,

$$\Delta X_{1,J}\,\Delta X_{2,J} = 1/2,$$

for the case when no prior information about the state is available. It will be shown in the
following section that this factor of 4 improvement is the ultimate limit.



III. UNCERTAINTY RELATIONS FOR OPTIMAL ESTIMATES 
A. Dispersion vs inaccuracy: a geometric uncertainty relation 

There are two types of contribution to the "uncertainty" of an estimate. The first, 
dispersion, is related to the statistics of the estimate itself, whereas the second, inaccuracy, 
is related to how well the estimate does its job of estimating a given observable. These two 
types of uncertainty are to some degree complementary, and it will be seen that for optimal 
estimates they are linked by a very simple uncertainty relation. 

To characterise dispersion, let $\mathcal{A}_f$ denote the observable corresponding to a general
estimate of $\mathcal{A}$ from measurement $\mathcal{M}$, where outcome $m$ of $\mathcal{M}$ corresponds to outcome $f(m)$
of $\mathcal{A}_f$. The statistics of the estimate are completely determined by the statistics of $\mathcal{M}$ and
the choice of $f$, and in particular the root mean square deviation of $\mathcal{A}_f$ may be calculated
in the usual way as

$$(\Delta\mathcal{A}_f)^2 = \sum_m p(m|\rho)\,f(m)^2 - \Big[\sum_m p(m|\rho)\,f(m)\Big]^2, \tag{28}$$

where the outcome probability $p(m|\rho)$ is given by Eq. (5). This quantity will be used as a
measure of the dispersion of the estimate.

To characterise the inaccuracy of the estimate $\mathcal{A}_f$, one requires a measure $\epsilon(\mathcal{A}_f)$ of the
degree to which the estimate differs from the observable being estimated. In particular, it
should be nonnegative, and vanish in the case of a perfect estimate (i.e., $\mathcal{A}_f = \mathcal{A}$). The
statistical deviation used in Eq. (9) satisfies these properties, and hence the quantity

$$\epsilon(\mathcal{A}_f) := D_\rho(\mathcal{A}, \mathcal{A}_f) \tag{29}$$

will be used as a measure of inaccuracy of the estimate. Note from Eq. (6) that, for Hermitian
observables, this measure is just the mean deviation of the noise operator associated with the
estimate [6, 8, 13]. Note also that the optimal estimates of Sec. II based on prior information
about the state of the system are precisely those estimates having the minimum possible
inaccuracy: $\epsilon(\mathcal{A}_f) \geq \epsilon(\mathcal{A}_{\rm opt})$.

It follows immediately from Eq. (A3) of the Appendix that

$$(\Delta A)^2 = (\Delta A_{\mathrm{opt}})^2 + \epsilon(A_{\mathrm{opt}})^2, \tag{30}$$

i.e., the dispersion and inaccuracy of the best possible estimate form the sides of a
right-angled triangle having hypotenuse $\Delta A$. Thus there is a fundamental tradeoff between
dispersion and inaccuracy, valid for any measurement $\mathcal{M}$. This tradeoff may be geometrically
represented by the constraint that $A_{\mathrm{opt}}$ lies on a circle (or hypersphere) having diametrically
opposed "poles" $A$ and $\langle A \rangle$. These poles correspond to the optimal estimates for $\mathcal{M} = A$
(i.e., a perfect estimate) and $\mathcal{M} = 1$ (i.e., a trivial estimate) respectively. Alternatively,
one may represent the tradeoff by a circle of radius $\Delta A$ in the dispersion-inaccuracy plane,
with zero inaccuracy and zero dispersion corresponding to the cases $\mathcal{M} = A$ and $\mathcal{M} = 1$
respectively.
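The Euclidean relation in Eq. (30) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the dimension, the random seed, and the helper names `rand_complex` and `random_pom` are illustrative assumptions. It draws a random state, observable and POM, computes the inaccuracy directly from the defining sum in Eq. (A2), and compares both sides of Eq. (30).

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily for reproducibility

def rand_complex(d):
    """Random complex d x d matrix with Gaussian entries (illustrative helper)."""
    return rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

def random_pom(d, n):
    """n positive operators on C^d rescaled so that they sum to the identity."""
    gs = [g @ g.conj().T for g in (rand_complex(d) for _ in range(n))]
    w, v = np.linalg.eigh(sum(gs))
    s = v @ np.diag(w ** -0.5) @ v.conj().T      # (sum_m G_m)^(-1/2)
    return [s @ g @ s for g in gs]

d = 3
g = rand_complex(d)
rho = g @ g.conj().T
rho /= np.trace(rho).real                         # density operator
h = rand_complex(d)
A = (h + h.conj().T) / 2                          # Hermitian observable
pom = random_pom(d, 4)

p = np.array([np.trace(rho @ M).real for M in pom])
# Optimal estimate: tr[rho (A M_m + M_m A)] / (2 tr[rho M_m]) = Re tr[rho A M_m] / p_m
a_opt = np.array([np.trace(rho @ A @ M).real for M in pom]) / p

var_A = np.trace(rho @ A @ A).real - np.trace(rho @ A).real ** 2
var_opt = np.sum(p * a_opt ** 2) - np.sum(p * a_opt) ** 2
# Inaccuracy from the defining sum: sum_m tr[M_m (A - a_m) rho (A - a_m)]
eye = np.eye(d)
eps2 = sum(np.trace(M @ (A - a * eye) @ rho @ (A - a * eye)).real
           for M, a in zip(pom, a_opt))

print(var_A, var_opt + eps2)  # the two numbers should agree to rounding error
```

The inaccuracy is deliberately computed from its definition rather than from Eq. (30), so the comparison is a genuine check rather than a tautology.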

The above geometric property, and the standard uncertainty relation $\Delta A\, \Delta B \geq
|\langle [A, B] \rangle|/2$, allow one to immediately write down a general uncertainty relation for the
best possible estimates of two Hermitian operators $A$ and $B$ from an arbitrary POM
measurement $\mathcal{M}$:

$$\left[ (\Delta A_{\mathrm{opt}})^2 + \epsilon(A_{\mathrm{opt}})^2 \right]^{1/2} \left[ (\Delta B_{\mathrm{opt}})^2 + \epsilon(B_{\mathrm{opt}})^2 \right]^{1/2} \geq |\langle [A, B] \rangle|/2. \tag{31}$$

Thus, for a non-zero lower bound, one cannot make both estimates arbitrarily accurate while
making the corresponding dispersions arbitrarily small, no matter what measurement scheme
is adopted. Note that the lower bound is achieved if and only if the system is in a
minimum-uncertainty state of $A$ and $B$.



B. Incompatibility implies inaccuracy 

One has the useful lower bound

$$\epsilon(A_{\mathrm{opt}})^2 \geq \sum_m \frac{|\langle [A, M_m] \rangle|^2}{4 \langle M_m \rangle} \tag{32}$$

for the inaccuracy of the best possible estimate. Equality holds in the case that $\rho$ is pure
and $\mathcal{M}$ is complete (i.e., with $M_m = |m\rangle\langle m|$ for all $m$), and hence in particular for the case
of heterodyne detection with pure signal and imageband fields. Note that since the optimal
estimate of $A$ has, by definition, the best possible accuracy, the righthand side of Eq. (32)
in fact provides a lower bound for the inaccuracy of any estimate of $A$ from $\mathcal{M}$, and hence
is universal.

The lower bound is non-trivial whenever $\langle [A, M_m] \rangle$ does not vanish for some $m$, i.e.,
whenever $A$ and $\mathcal{M}$ are incompatible for state $\rho$. Hence, one can never make a perfect
estimate of one observable from the measurement of a second incompatible observable. When
$A$ and $\mathcal{M}$ are a pair of canonically conjugate observables, the lower bound is proportional to
the Fisher information of $\mathcal{M}$, and the case of equality corresponds to an "exact uncertainty
relation" for $A$ and $\mathcal{M}$ [26, 29].
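The equality and inequality cases of the bound in Eq. (32) can be illustrated numerically. This is a sketch under stated assumptions: the bound is taken in the form $\epsilon(A_{\mathrm{opt}})^2 \geq \sum_m |\langle [A, M_m] \rangle|^2 / (4\langle M_m \rangle)$, the inaccuracy is computed as $\langle A^2 \rangle - \sum_m A_{\mathrm{opt}}(m|\rho)^2\, \mathrm{tr}[\rho M_m]$ as in Eq. (A3), and the dimension, seed and function names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed
d = 3

def rand_complex(*shape):
    """Random complex array with Gaussian entries (illustrative helper)."""
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

g = rand_complex(d, d)
A = (g + g.conj().T) / 2                                   # Hermitian observable
U, _ = np.linalg.qr(rand_complex(d, d))                    # random orthonormal basis
pom = [np.outer(U[:, m], U[:, m].conj()) for m in range(d)]  # complete POM |m><m|

def inaccuracy_and_bound(rho):
    """Return (eps(A_opt)^2, righthand side of the assumed form of Eq. (32))."""
    eps2 = np.trace(rho @ A @ A).real
    bound = 0.0
    for M in pom:
        p = np.trace(rho @ M).real
        z = np.trace(rho @ A @ M)
        eps2 -= z.real ** 2 / p                    # A_opt(m)^2 p_m = (Re z)^2 / p_m
        bound += abs(z - np.conj(z)) ** 2 / (4 * p)  # z - z* = <[A, M_m]>
    return eps2, bound

psi = rand_complex(d)
psi /= np.linalg.norm(psi)
rho_pure = np.outer(psi, psi.conj())
rho_mixed = 0.5 * rho_pure + 0.5 * np.eye(d) / d

eps2_pure, bound_pure = inaccuracy_and_bound(rho_pure)
eps2_mixed, bound_mixed = inaccuracy_and_bound(rho_mixed)
print(eps2_pure, bound_pure)    # equal: pure state, complete measurement
print(eps2_mixed, bound_mixed)  # inaccuracy is at least as large as the bound
```

The pure-state case reproduces the stated equality condition, while mixing in the maximally mixed state leaves only the inequality.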

Eq. (32) generalises Eq. (47) of Ref. [26] (in the context of exact uncertainty relations)
and Eq. (14) in Ref. [28] (in the context of weak values), to general POM measurements $\mathcal{M}$.
It follows via the Schwarz inequality $|\mathrm{tr}[K^\dagger L]|^2 \leq \mathrm{tr}[K^\dagger K]\, \mathrm{tr}[L^\dagger L]$, which gives

$$|\mathrm{tr}[\rho A M_m]|^2 \leq \mathrm{tr}[\rho M_m]\, \mathrm{tr}[\rho A M_m A]$$

for the choice $K = \rho^{1/2} M_m^{1/2}$, $L = \rho^{1/2} A M_m^{1/2}$. Noting the first equality in Eq. (A3) of the
Appendix, and using $(z + z^*)^2 = 4|z|^2 - |z - z^*|^2$ for $z = \mathrm{tr}[\rho A M_m]$ appearing in the optimal
estimate in Eq. (10), then leads directly to Eq. (32). Equality holds for $K$ proportional to
$L$, and hence in particular for a complete measurement on a pure state.

C. Example: heterodyne detection



For heterodyne detection with a vacuum-state imageband field, as discussed in Sec. II.E,
it will be shown that one has the further independent inequalities

$$\Delta X_{1,\mathrm{opt}}\, \Delta X_{2,\mathrm{opt}} \geq 1/8, \tag{33}$$
$$\epsilon(X_{1,\mathrm{opt}})^2 + \epsilon(X_{2,\mathrm{opt}})^2 \geq 1/4, \tag{34}$$

for the dispersions and the inaccuracies of the best possible estimates. The first relation is
saturated for coherent states, and the second relation is saturated for all pure states.

Note that for the analogous case of a canonical joint measurement of position and
momentum as discussed in Sec. II.D (with the auxiliary system in a minimum uncertainty state),
it follows immediately from Eq. (33) that one has the corresponding uncertainty relation

$$\Delta X_{\mathrm{opt}}\, \Delta P_{\mathrm{opt}} \geq \hbar/4, \tag{35}$$

improving on the "universally unbiased" lower bound in Eq. (4) by a factor of 4. Thus, even
when one has complete information about the system prior to measurement, there is still a
fundamental lower bound to the product of the dispersions of the optimal estimates.

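To prove Eqs. (33) and (34), recall that the 2x2 covariance matrix $C$ for two random
variables $A_1$, $A_2$ is given by $C_{jk} := \langle A_j A_k \rangle - \langle A_j \rangle \langle A_k \rangle$. Hence the covariance matrix $C^{\mathrm{opt}}$
of the optimal quadrature estimates follows via Eqs. (21), (22) and (26) as

$$\begin{aligned}
C^{\mathrm{opt}}_{jk} &= C^Q_{jk} + \frac{1}{4} \int d^2\alpha \left( \alpha_j \frac{\partial Q}{\partial \alpha_k} + \alpha_k \frac{\partial Q}{\partial \alpha_j} \right) + \frac{1}{16} F^Q_{jk} \\
&= C^Q_{jk} + (1/16) F^Q_{jk} - (1/2) \delta_{jk}.
\end{aligned} \tag{36}$$

Here $C^Q$ is the covariance matrix for the joint-quadrature observables $X_{1J}$ and $X_{2J}$ in
Eq. (18), $F^Q$ denotes the Fisher information matrix of the Husimi Q-function with [37]

$$F^Q_{jk} := \int d^2\alpha\, Q\, (\partial \log Q/\partial \alpha_j)(\partial \log Q/\partial \alpha_k), \tag{37}$$

and integration by parts has been used to obtain the second line.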

Now, if $F_j$ denotes the Fisher information of the marginal distribution $Q_j(\alpha_j)$ for $\alpha_j$, then
the Cramer-Rao inequality from classical statistics [37] yields $F_j \geq 1/C^Q_{jj}$. One also has

$$0 \leq \int d^2\alpha\, Q(\alpha) \left[ (\partial \log Q/\partial \alpha_j) - (\partial \log Q_j/\partial \alpha_j) \right]^2 = F^Q_{jj} - F_j.$$

Substitution of these inequalities into Eq. (36) then yields

$$C^{\mathrm{opt}}_{jj} \geq C^Q_{jj} + 1/(16\, C^Q_{jj}) - 1/2.$$

Writing $\mathrm{Var}X_1 = \gamma r/4$, $\mathrm{Var}X_2 = \gamma/(4r)$, with $\gamma \geq 1$ (to satisfy the standard uncertainty
relation for the quadratures) and $r > 0$, and noting from Eq. (18) that $C^Q_{jj} = \mathrm{Var}X_j + 1/4$,
therefore leads to

$$(\Delta X_{1,\mathrm{opt}})^2 (\Delta X_{2,\mathrm{opt}})^2 \geq \frac{\gamma^4 r}{16}\, \frac{1}{\gamma + r}\, \frac{1}{\gamma r + 1}.$$

Minimising the righthand side with respect to $r$ gives $r = 1$; minimising the resulting
expression with respect to $\gamma \geq 1$ then gives $\gamma = 1$; and Eq. (33) immediately follows.

Finally, to obtain Eq. (34), note first that combining Eqs. (20), (25), (32) and (37) gives

$$\epsilon(X_{1,\mathrm{opt}})^2 \geq F^Q_{22}/16, \qquad \epsilon(X_{2,\mathrm{opt}})^2 \geq F^Q_{11}/16, \tag{38}$$

where equality holds for all pure states. Thus the accuracy of the estimate of one quadrature
is related to the Fisher information of the other quadrature. Moreover, taking the trace of
$C^{\mathrm{opt}}$ in Eq. (36) and using the Euclidean relation in Eq. (30), one also has

$$\epsilon(X_{1,\mathrm{opt}})^2 + \epsilon(X_{2,\mathrm{opt}})^2 = 1/2 - (F^Q_{11} + F^Q_{22})/16$$

(giving an upper bound of half a photon for the lefthand side). Comparison with Eq. (38)
immediately yields the known relation [37]

$$F^Q_{11} + F^Q_{22} \leq 4 \tag{39}$$

for the trace of the Fisher information matrix, which when inserted back into the previous
expression yields Eq. (34) as desired.
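As a consistency check on the saturation claims, the coherent-state case can be worked through explicitly. This is a sketch that assumes only that the Husimi Q-function of a coherent state is Gaussian, so that the diagonal Fisher information is the inverse of the corresponding variance; the numerical values follow from $\mathrm{Var}X_j = 1/4$ and $C^Q_{jj} = \mathrm{Var}X_j + 1/4$.

```latex
\begin{aligned}
C^Q_{jj} &= \tfrac14 + \tfrac14 = \tfrac12, \qquad F^Q_{jj} = 1/C^Q_{jj} = 2, \\
(\Delta X_{j,\mathrm{opt}})^2 &= C^{\mathrm{opt}}_{jj}
   = \tfrac12 + \tfrac{2}{16} - \tfrac12 = \tfrac18, \\
\epsilon(X_{j,\mathrm{opt}})^2 &= \mathrm{Var}X_j - (\Delta X_{j,\mathrm{opt}})^2
   = \tfrac14 - \tfrac18 = \tfrac18 = F^Q_{jj}/16 .
\end{aligned}
```

Hence $\Delta X_{1,\mathrm{opt}}\, \Delta X_{2,\mathrm{opt}} = 1/8$, $\epsilon(X_{1,\mathrm{opt}})^2 + \epsilon(X_{2,\mathrm{opt}})^2 = 1/4$ and $F^Q_{11} + F^Q_{22} = 4$, so that Eqs. (33), (34), (38) and (39) all hold with equality for coherent states.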

IV. UNIVERSAL JOINT-MEASUREMENT UNCERTAINTY RELATION 
A. Arbitrary estimates 

The uncertainty relation to be derived in this section applies to any estimates of two
Hermitian operators $A$ and $B$ from a general measurement $\mathcal{M}$. Unlike the geometric uncertainty
relation of the previous section, it is valid for both optimal and non-optimal estimates, and
is independent of whether or not any prior information about the system is available. The
associated derivation may be modified to obtain the more restrictive uncertainty relations
satisfied by universally unbiased estimates, such as Eq. (4).

Suppose then that $f(m)$ and $g(m)$ are general estimates for $A$ and $B$ respectively, for
measurement result $\mathcal{M} = m$. These estimates thus correspond to two compatible observables
$A_f$ and $B_g$, measured by measuring $\mathcal{M}$ and for outcome $m$ assigning the values $f(m)$ and
$g(m)$ respectively. It will be shown that these estimates satisfy the universal uncertainty
relation

$$\Delta A_f\, \epsilon(B_g) + \epsilon(A_f)\, \Delta B_g + \epsilon(A_f)\, \epsilon(B_g) \geq |\langle [A, B] \rangle|/2. \tag{40}$$

This uncertainty relation is therefore a fundamental expression of the limitations imposed
by complementarity on quantum systems.

As a very simple example, suppose that one makes no physical measurement at all, but
simply assigns the same fixed values to $A$ and $B$ as estimates on every occasion. Then clearly
the dispersions of the estimates vanish: $\Delta A_f = \Delta B_g = 0$. The universal uncertainty relation
Eq. (40) then implies that the product of the inaccuracies of such trivial estimates is
non-trivially bounded below, i.e.,

$$\epsilon(A_f)\, \epsilon(B_g) \geq |\langle [A, B] \rangle|/2.$$

As a less trivial example, suppose that the position $X$ of a quantum particle is measured,
and used to estimate both the position and the momentum of the particle. It is natural to
choose $X_f = X$ (this is in fact the optimal estimate, whether or not any prior information is
available). This estimate of $X$ is perfectly accurate, i.e., $\epsilon(X_f) = 0$, and hence from Eq. (40)

$$\Delta X\, \epsilon(P_g) \geq \hbar/2$$

for any corresponding estimate $P_g$ of the momentum. Note that this is a stronger result
than the corresponding geometric uncertainty relation following from Eq. (31).

The proof of Eq. (40) proceeds via a formal trick - the representation of the measurement
$\mathcal{M}$ as a Hermitian operator $M'$ on an extended Hilbert space. This representation (a
Naimark extension) preserves the statistical deviation between observables, while allowing
one to exploit algebraic properties of Hermitian operators. Any such representation can be
used for the proof; however, the choice of a product space representation is perhaps the
simplest.

In particular, for a given POM $\mathcal{M} = \{M_m\}$ one can always (formally) introduce an
auxiliary system described by some fixed state $\rho'$, and a Hermitian operator $M'$ acting on
the tensor product of the system and auxiliary system Hilbert spaces, such that the statistics
of $\mathcal{M}$ and $M'$ are identical [1, 2, 25], i.e.,

$$p(m|\rho) = \mathrm{tr}[\rho M_m] = \mathrm{tr}[\rho \otimes \rho'\, M'_m] \tag{41}$$

for all density operators $\rho$ and outcomes $m$, where $M'_m$ denotes the projection on the
eigenspace associated with eigenvalue $m$ of $M'$. Note that this representation is used here
as a formal mathematical device only, with no physical content.

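It follows immediately from Eq. (41) that the statistics of general estimates $A_f$ and $B_g$
are equivalent to the statistics of the (commuting) Hermitian operators $f(M')$ and $g(M')$
respectively. Further, if $\{|s'\rangle\}$ denotes a complete set of kets for the auxiliary Hilbert space,
Eq. (41) yields the partial trace relation

$$\mathrm{tr}_{\rho'}[\rho' M'_m] := \sum_{s'} \langle s'| \rho' M'_m |s'\rangle = M_m. \tag{42}$$

Hence, using Eqs. (A1) and (A2) of the Appendix, one has

$$\begin{aligned}
D_{\rho \otimes \rho'}(A, f(M'))^2 &= \langle A^2 \rangle + \langle f(M')^2 \rangle - \sum_m f(m)\, \mathrm{tr}[\rho \otimes \rho' (A M'_m + M'_m A)] \\
&= \langle A^2 \rangle + \langle A_f^2 \rangle - \sum_m f(m) \left\{ \mathrm{tr}_\rho\big[\rho A\, \mathrm{tr}_{\rho'}[\rho' M'_m]\big] + \mathrm{c.c.} \right\} \\
&= \langle A^2 \rangle + \langle A_f^2 \rangle - \mathrm{tr}[\rho A A_f + A_f A \rho] \\
&= D_\rho(A, A_f)^2 = \epsilon(A_f)^2,
\end{aligned}$$

and thus the representation preserves statistical deviation and inaccuracy. Writing $\delta A =
A - f(M')$, $\delta B = B - g(M')$, it follows that $\epsilon(A_f)^2 = \langle (\delta A)^2 \rangle$ and $\epsilon(B_g)^2 = \langle (\delta B)^2 \rangle$, and
hence that

$$\begin{aligned}
|\langle [A, B] \rangle| &= |\langle [f(M') + \delta A,\, g(M') + \delta B] \rangle| \\
&\leq |\langle [f(M'), \delta B] \rangle| + |\langle [\delta A, g(M')] \rangle| + |\langle [\delta A, \delta B] \rangle| \\
&\leq 2 \Delta f(M')\, \epsilon(B_g) + 2 \epsilon(A_f)\, \Delta g(M') + 2 \epsilon(A_f)\, \epsilon(B_g),
\end{aligned}$$

using the triangle inequality and the Schwarz inequality $\langle (K - k)^2 \rangle \langle L^2 \rangle \geq |\langle [K, L] \rangle|^2/4$ (in
a manner formally similar to Ozawa's proof of Eq. (3) [8, 9]). The last line is equivalent to
the universal uncertainty relation in Eq. (40).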
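The universal relation in Eq. (40) can be spot-checked numerically for arbitrary (non-optimal) estimates. The sketch below is illustrative only: the dimension, number of outcomes, seed and helper names are assumptions, the dispersion is computed from Eq. (28), and the inaccuracy from $\epsilon(A_f)^2 = \langle A^2 \rangle - \sum_m f(m)\, \mathrm{tr}[\rho(A M_m + M_m A)] + \sum_m f(m)^2\, \mathrm{tr}[\rho M_m]$ as in the Appendix.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary fixed seed

def rand_complex(d):
    """Random complex d x d matrix with Gaussian entries (illustrative helper)."""
    return rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

def rand_hermitian(d):
    g = rand_complex(d)
    return (g + g.conj().T) / 2

def random_pom(d, n):
    """n positive operators on C^d rescaled so that they sum to the identity."""
    gs = [g @ g.conj().T for g in (rand_complex(d) for _ in range(n))]
    w, v = np.linalg.eigh(sum(gs))
    s = v @ np.diag(w ** -0.5) @ v.conj().T
    return [s @ g @ s for g in gs]

def dispersion_and_inaccuracy(rho, A, pom, f):
    """Return (Delta A_f, eps(A_f)) for the estimate m -> f[m]."""
    p = np.array([np.trace(rho @ M).real for M in pom])
    re_z = np.array([np.trace(rho @ A @ M).real for M in pom])
    disp2 = np.sum(p * f ** 2) - np.sum(p * f) ** 2            # Eq. (28)
    eps2 = (np.trace(rho @ A @ A).real
            - 2 * np.sum(f * re_z) + np.sum(p * f ** 2))       # D_rho(A, A_f)^2
    return np.sqrt(max(disp2, 0.0)), np.sqrt(max(eps2, 0.0))

def slack(d=4, n=5):
    """Lefthand side minus righthand side of Eq. (40) for one random instance."""
    g = rand_complex(d)
    rho = g @ g.conj().T
    rho /= np.trace(rho).real
    A, B = rand_hermitian(d), rand_hermitian(d)
    pom = random_pom(d, n)
    f, g_est = rng.normal(size=n), rng.normal(size=n)          # arbitrary estimates
    dA, eA = dispersion_and_inaccuracy(rho, A, pom, f)
    dB, eB = dispersion_and_inaccuracy(rho, B, pom, g_est)
    lhs = dA * eB + eA * dB + eA * eB
    rhs = abs(np.trace(rho @ (A @ B - B @ A))) / 2
    return lhs - rhs

slacks = [slack() for _ in range(200)]
print(min(slacks))  # expected to be nonnegative if Eq. (40) holds
```

A random search of this kind cannot prove the relation, but a negative slack would immediately signal an error in either the formula or its transcription.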

Finally, the above derivation may be modified to obtain a stronger uncertainty relation,
valid for the special case of universally unbiased estimates of $A$ and $B$ [6, 11, 13]. In
particular, the requirements that $\langle A_f \rangle = \langle A \rangle$, $\langle B_g \rangle = \langle B \rangle$ for all states $\rho$ imply via Eq. (42)
that

$$A = \mathrm{tr}_{\rho'}[\rho' f(M')], \qquad B = \mathrm{tr}_{\rho'}[\rho' g(M')].$$

Hence $\mathrm{tr}_{\rho'}[\rho' A\, g(M')] = AB = \mathrm{tr}_{\rho'}[\rho' f(M') B]$, implying that $\langle [\delta A, \delta B] \rangle = -\langle [A, B] \rangle$. Thus,
with no triangle inequality being necessary, the Schwarz inequality yields

$$\epsilon(A_f)\, \epsilon(B_g) \geq |\langle [\delta A, \delta B] \rangle|/2 = |\langle [A, B] \rangle|/2. \tag{43}$$

The joint uncertainty relation for universally unbiased joint measurements of position and
momentum, Eq. (4), is a straightforward consequence of this result [6, 11, 13].

B. Example: EPR estimates 

The notion that the properties of position and motion are incompatible goes back nearly 
2500 years to Zeno of Elea (who resolved the issue by concluding that motion was impossible). 
However, in classical physics this notion was rejected due to the existence of a consistent 
model: one can simultaneously define both the position and motion of a classical system 
by assuming that it follows a (differentiable) continuous trajectory in configuration space. 
Unfortunately, in the standard quantum formalism there are no such trajectories for physical 
systems, and a new resolution of the issue is needed. 

In the standard interpretation of quantum mechanics, as formulated by Heisenberg and 
Bohr [4, 10], one takes the view that the properties of position and motion are indeed incom- 
patible, in the sense of being unable to be accurately defined/measured simultaneously, and 
to this extent agrees with Zeno. However, others (most notably Einstein) have argued that 
the quantum formalism is in fact incomplete, and that quantum systems can in particular 
have simultaneously well-defined physical values of position and momentum [18]. It has 
since been shown that any such "hidden variable" interpretation requires the existence of a 
mutual influence or conspiracy between a measurement made on one system and the values 
ascribed to a space-like separated system [38, 39, 40]. Even so, it is of interest to consider 
the relation of the famous incompleteness argument made by Einstein, Podolsky and Rosen 
(EPR) [18] to the principle of complementarity, as embodied in Eq. (40). 

The EPR paper considers two particles described by an eigenket of relative position and
total momentum [18]. Clearly, the position of the first particle can be estimated precisely by
a direct measurement of the position, with perfect accuracy: $\epsilon(X_{\mathrm{opt}}) = 0$. Simultaneously,
the correlation between the particles allows the momentum of the first particle to also be
estimated precisely, from a measurement of the momentum of the second particle, again
with perfect accuracy: $\epsilon(P_{\mathrm{opt}}) = 0$. At first sight it thus appears that the universal
joint-measurement uncertainty relation in Eq. (40) is violated by the EPR example.

To see what is happening, it is helpful to replace the non-normalisable eigenket considered
by EPR with the physical wavefunction

$$\psi(x, x') = K\, e^{-(x - x' - a)^2/4\sigma^2}\, e^{-\tau^2 (x + x')^2/4\hbar^2}\, e^{i p_0 (x + x')/2\hbar},$$

where $K$ is a normalisation constant and $\sigma, \tau \ll 1$ in suitable units. One has

$$\langle X - X' \rangle = a, \qquad \mathrm{Var}(X - X') = \sigma^2 \ll 1,$$

$$\langle P + P' \rangle = p_0, \qquad \mathrm{Var}(P + P') = \tau^2 \ll 1,$$

and hence $\psi$ is an approximate eigenstate of the relative position and total momentum.

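Suppose now that $X$ and $P'$ are simultaneously measured as before, with measurement
results $x$ and $p'$ respectively. The corresponding best possible estimates of $X$ and $P$ then
follow via Eq. (8) as

$$X_{\mathrm{opt}} = x, \qquad P_{\mathrm{opt}} = \frac{\hbar^2 (p_0 - p') + \sigma^2 \tau^2 p'}{\hbar^2 + \sigma^2 \tau^2} \approx p_0 - p'.$$

The dispersions and inaccuracies of these estimates follow from straightforward calculation
as

$$\Delta X_{\mathrm{opt}} = \frac{(\hbar^2 + \sigma^2 \tau^2)^{1/2}}{2\tau} \approx \frac{\hbar}{2\tau}, \qquad \epsilon(X_{\mathrm{opt}}) = 0,$$

$$\Delta P_{\mathrm{opt}} = \frac{|\hbar^2 - \sigma^2 \tau^2|}{2\sigma (\hbar^2 + \sigma^2 \tau^2)^{1/2}} \approx \frac{\hbar}{2\sigma}, \qquad \epsilon(P_{\mathrm{opt}}) = \frac{\hbar \tau}{(\hbar^2 + \sigma^2 \tau^2)^{1/2}} \approx \tau.$$
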
Substitution into the lefthand side of the joint measurement uncertainty relation in Eq. (40) 
then gives h/2, which is precisely equal to the value of the righthand side - the state is in 
fact a minimum joint-uncertainty state of position and momentum (other equalities for this 
state are given in Ref. [26], where the effect of wavefunction collapse on optimal estimates 
is also considered). 
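The saturation of Eq. (40) by this state can be confirmed with a few lines of arithmetic. The sketch below takes the closed-form dispersions and inaccuracies at face value, as written out in the comments; the function name `epr_estimates`, the choice $\hbar = 1$ and the sample values of $\sigma$ and $\tau$ are illustrative assumptions.

```python
import math

def epr_estimates(sigma, tau, hbar=1.0):
    """Dispersions and inaccuracies of the optimal EPR estimates (closed forms)."""
    s = hbar ** 2 + sigma ** 2 * tau ** 2
    dX = math.sqrt(s) / (2 * tau)                    # Delta X_opt
    eX = 0.0                                         # eps(X_opt): X is measured directly
    dP = abs(hbar ** 2 - sigma ** 2 * tau ** 2) / (2 * sigma * math.sqrt(s))  # Delta P_opt
    eP = hbar * tau / math.sqrt(s)                   # eps(P_opt)
    return dX, eX, dP, eP

hbar = 1.0
sigma, tau = 0.05, 0.02                              # small spreads, as in the text
dX, eX, dP, eP = epr_estimates(sigma, tau, hbar)

lhs = dX * eP + eX * dP + eX * eP                    # lefthand side of Eq. (40)
print(lhs, hbar / 2)                                 # both equal hbar/2
print(dX * tau, dP * sigma, eP / tau)                # close to hbar/2, hbar/2 and 1
```

Only the cross term $\Delta X_{\mathrm{opt}}\, \epsilon(P_{\mathrm{opt}})$ survives, and the factors of $(\hbar^2 + \sigma^2\tau^2)^{1/2}$ cancel exactly, which is why the bound is saturated for every choice of $\sigma$ and $\tau$ rather than only in the limit.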

The above results support, in a quantitative manner, Bohr's defence of the consistency 
of complementarity with the completeness of the standard quantum formalism [10, 19]. 
The EPR argument in fact goes somewhat further, asserting the physical reality of the
estimated value of P from the measurement of P', and the simultaneous physical reality of
the estimated value of X following from the alternative measurement of X' [18]. However,
precisely because these measurements do not refer to a single experimental setup, such 
assertions go beyond the quantum formalism, and cannot be tested via Eq. (40). 

More generally, even when one has full knowledge of the state of some system, and uses 
this prior information to make the best possible estimates of two complementary observables 
from a given experimental setup, there remains a fundamental tradeoff between dispersion 
and inaccuracy - embodied by the universal uncertainty relation in Eq. (40) - which prevents 
simultaneous knowledge of the corresponding physical properties. 

C. Example: linear estimates 

It is of interest to consider an example where one does not know the state of the system 
before measurement, but does have prior knowledge of the averages of one or more observ- 
ables. While such prior information is by itself insufficient to make an optimal estimate as 
per Eq. (10), it can still be taken into account to improve on the "no information" estimate 
of Eq. (13). 

One method of proceeding might be to introduce some physical principle to assign a 
unique state to the system that is consistent with the given prior information, and to cal- 
culate estimates by substituting this state for p in Eq. (10). For example, the maximum 
entropy principle of Jaynes could be used for this purpose [41] (indeed the "thermodynamic" 
example in Eq. (15) may be reinterpreted in this way, where the form of the density operator 
corresponds to the maximum entropy state consistent with a known prior average energy of 
the system [41]). 

In general, however, there are many possible physical states consistent with given prior
knowledge about certain averages. Further, the available prior information may well
imply, for example, that the system is not described by a maximum entropy state (e.g., in a
communication setup it may be known that each signal is described by one of a number of
fixed pure states $|\psi_1\rangle, |\psi_2\rangle, \ldots$ having equal average energies). It is therefore important to
consider estimation methods that use only the prior information that is available, without
requiring assumptions about the actual state of the system. Here linear estimates and their
joint uncertainty properties will be examined.

Consider first a detection system for a classical signal $s$, which is subject to uncorrelated
noise $n$, resulting in a measured signal $m = s + n$. It will be assumed that $\langle n \rangle = 0$. If $m$ is
taken as an estimate for $s$, the average deviation of this estimate from the actual signal $s$ is
quantified by

$$\epsilon^2 = \langle (m - s)^2 \rangle = N,$$

where $N$ denotes the noise variance $\langle n^2 \rangle$.

However, one can do better if there is some prior information about the signal statistics.
For example, suppose one knows the average value $\bar{s} = \langle s \rangle$ and the variance $S = \langle (s - \bar{s})^2 \rangle$ of
the signal. Then it is straightforward to show that the linear estimate $m_{\mathrm{lin}} = \lambda m + (1 - \lambda) \bar{s}$
has a minimum statistical deviation from the signal $s$ given by

$$\epsilon_{\mathrm{lin}}^2 = \langle (m_{\mathrm{lin}} - s)^2 \rangle = NS/(S + N) < \epsilon^2,$$

corresponding to the choice $\lambda = S/(S + N)$. The associated rms uncertainty of this estimate
follows as

$$\Delta m_{\mathrm{lin}} = S/(S + N)^{1/2} = (1 + N/S)^{-1} \Delta m.$$

Thus, use of the prior information allows not only a better estimate of the signal, but also a
reduction in the dispersion of the estimate of the signal. Note that for the particular case
of Gaussian signal and noise distributions, the above linear estimate is in fact optimal over
any other estimate [42].
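The optimal weight for the classical linear estimate can be verified directly. The sketch below uses the fact that, for zero-mean noise uncorrelated with the signal, $m_{\mathrm{lin}} - s = \lambda n - (1 - \lambda)(s - \bar{s})$, so that the mean square deviation is $\lambda^2 N + (1 - \lambda)^2 S$; the function names and the sample values of $S$ and $N$ are illustrative assumptions.

```python
import numpy as np

def linear_mse(lam, S, N):
    """<(m_lin - s)^2> for m_lin = lam*m + (1 - lam)*mean(s), with m = s + n."""
    return lam ** 2 * N + (1 - lam) ** 2 * S

def best_linear(S, N):
    """Optimal weight, minimum mean square deviation, and rms dispersion."""
    lam = S / (S + N)
    return lam, N * S / (S + N), S / np.sqrt(S + N)

S, N = 2.0, 0.5                                   # signal and noise variances
lam, mse, disp = best_linear(S, N)

# Brute-force check of the minimiser on a fine grid of weights in [0, 1]
grid = np.linspace(0.0, 1.0, 10001)
lam_grid = grid[np.argmin(linear_mse(grid, S, N))]

print(lam, lam_grid)                              # both close to S/(S+N)
print(mse, N)                                     # minimum deviation is below N
print(disp, np.sqrt(S + N) / (1 + N / S))         # Delta m_lin two ways
```

Setting the derivative $2\lambda N - 2(1 - \lambda) S$ to zero reproduces $\lambda = S/(S + N)$, and substituting back gives $NS/(S + N)$, in agreement with the grid search.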

Consider now the canonical joint measurement of position and momentum for a quantum
particle as previously discussed, corresponding to measurement of the commuting operators
$X_J = X + X'$, $P_J = P - P'$, where the primed variables refer to an auxiliary particle in a
minimum uncertainty state with $\langle X' \rangle = \langle P' \rangle = 0$. It will be assumed that all that is known
about the particle prior to measurement are the means and variances of $X$ and $P$.

The observables $X$, $X'$, $X_J = X + X'$ all commute, and are therefore completely analogous
to the respective classical variables $s$, $n$, and $m = s + n$. It immediately follows that the
best linear estimate of $X$ from $X_J$, given knowledge of $\langle X \rangle$ and $\mathrm{Var}X$, is equivalent to
measurement of the operator $X_{\mathrm{lin}} = \lambda X_J + (1 - \lambda) \langle X \rangle$, with $\lambda = (1 + \mathrm{Var}X'/\mathrm{Var}X)^{-1}$;
associated inaccuracy

$$\epsilon(X_{\mathrm{lin}}) = \Delta X\, \Delta X'/(\mathrm{Var}X + \mathrm{Var}X')^{1/2};$$

and associated dispersion

$$\Delta X_{\mathrm{lin}} = \mathrm{Var}X/(\mathrm{Var}X + \mathrm{Var}X')^{1/2} = (1 + \mathrm{Var}X'/\mathrm{Var}X)^{-1} \Delta X_J.$$

One similarly has an optimal linear estimate $P_{\mathrm{lin}}$ obtained from knowledge of $\langle P \rangle$ and $\mathrm{Var}P$,
with analogous expressions for $\epsilon(P_{\mathrm{lin}})$ and $\Delta P_{\mathrm{lin}}$.

Note that there is a degree of freedom remaining, which may be tuned for further
optimality. In particular, the squeezing ratio $\Delta X'/\Delta P'$ may be chosen to minimise some suitable
cost function. For example, for a harmonic oscillator one might choose to minimise the
"inaccuracy energy" $\epsilon(P_{\mathrm{lin}})^2/(2m) + (m\omega^2/2)\, \epsilon(X_{\mathrm{lin}})^2$. However, the existence of the universal
uncertainty relation in Eq. (40) suggests the more generic "joint uncertainty" cost function

$$J = \Delta X_{\mathrm{lin}}\, \epsilon(P_{\mathrm{lin}}) + \epsilon(X_{\mathrm{lin}})\, \Delta P_{\mathrm{lin}} + \epsilon(X_{\mathrm{lin}})\, \epsilon(P_{\mathrm{lin}}).$$

Minimising $J$ with respect to the squeezing ratio leads to two regimes. First, if $\Delta X\, \Delta P <
2\hbar$, then it is optimal to choose $\Delta X'/\Delta P' = \Delta X/\Delta P$, which leads to the inequality

$$\Delta X_{\mathrm{lin}}\, \Delta P_{\mathrm{lin}} \geq [1 + \hbar^2/(4\, \mathrm{Var}X\, \mathrm{Var}P)]^{-1}\, \hbar/2 \geq \hbar/4,$$

analogous to the lower bound in Eq. (35). However, for $\Delta X\, \Delta P > 2\hbar$, it is optimal to choose
either of $\Delta X'$ and $\Delta P'$ equal to zero, corresponding to the alternatives $X_{\mathrm{lin}} = X$, $P_{\mathrm{lin}} = \langle P \rangle$
and $X_{\mathrm{lin}} = \langle X \rangle$, $P_{\mathrm{lin}} = P$ respectively - i.e., not to bother with a true joint measurement
at all! A similar dichotomy of regimes has been noted previously for the special case of
Gaussian states [43].

V. CONCLUSIONS 

A general formula for the best possible estimate of one observable from the measure- 
ment of another has been given, and applied in a number of settings. A universal joint- 
measurement uncertainty relation has also been given, which quantifies the principle of 
quantum complementarity for all possible experimental setups. Describing measurements
by completely general POMs (which require only that probabilities are positive and sum to
unity) implies that the main results of the paper are universally applicable, and
independent of any dynamical models and interpretational issues concerning quantum measurement.
It is also worth noting that the use of a general POM includes the case where an
experimenter bases an estimate on the results of a plurality of measurements, obtained by carrying
out a number of (predetermined) consecutive physical operations (described by "completely
positive" linear maps [25]).

It has been shown that by using prior information about the system (e.g., the state of
the system in Sec. III.C and the mean and variance of certain observables in Sec. IV.C)
one can improve the standard uncertainty relation for the canonical joint measurement 
of position and momentum by up to a factor of 4. However, unlike the classical case, 
if one makes optimal use of complete information about the system before measurement, 
one cannot do any better than this - complementarity cannot be circumvented by the use of 
prior knowledge. The principle of complementarity is similarly consistent with respect to the 
properties of entangled systems - as demonstrated in Sec. IV. B, quantum correlations cannot 
be exploited to violate the universal joint-measurement uncertainty relation of Eq. (40). 

Finally, it would be of interest to determine the best possible estimate of an observable 
under the imposition of further natural restrictions. For example, one could require that an 
estimate of photon number, from some general measurement, minimise statistical deviation 
subject to the further constraint of being a positive integer. This would reduce the accuracy 
of the estimate relative to the unconstrained case, but has the advantage of incorporating 
prior information about the possible physical values of the observable being estimated. It 
would similarly be of interest to consider alternative characterisations of dispersion and
inaccuracy (e.g., entropy and relative entropy).

Some time after this paper was submitted, a related eprint by Ozawa has appeared [45],
giving an independent derivation of the universal uncertainty relation in Eq. (40).

APPENDIX A 

The proofs of Eqs. (10) and (13), for optimal estimates of a Hermitian operator $A$ from a
general measurement $\mathcal{M}$, are given here. The generalisation to the optimal estimate of any
POM observable $\mathcal{A}$ from measurement of $\mathcal{M}$ is also discussed.

The main ingredient required is a measure of "how good" a given estimate of $A$ is. For
the case of two Hermitian operators $A$ and $B$, a natural measure of how well one mimics
the other, for a given state $\rho$, is given by the statistical deviation

$$D_\rho(A, B)^2 = \mathrm{tr}[\rho (A - B)^2]. \tag{A1}$$

This measure was used in the proof of Eq. (8) for the special case where $\mathcal{M}$ corresponds to
a Hermitian operator $M$. However, to consider arbitrary measurements $\mathcal{M}$ it is necessary
to generalise this measure to the case where one observable is an arbitrary POM observable.

Fortunately, the generalisation of Eq. (A1) is quite straightforward [9, 32]. In particular,
it is natural to define the statistical deviation between a Hermitian operator $A$ and a POM
observable $\mathcal{M} = \{M_m\}$ by

$$D_\rho(A, \mathcal{M})^2 = \sum_m \mathrm{tr}[M_m (A - m) \rho (A - m)] = \mathrm{tr}[\rho (A - \bar{M})^2] + \mathrm{tr}[\rho (\overline{M^2} - \bar{M}^2)], \tag{A2}$$

where $\overline{M^j} := \sum_m m^j M_m$. This expression reduces to Eq. (A1) for Hermitian observables.
It follows directly from a natural algebra for POM observables [32] (being the square root
of the average of the square of the "difference" of two such observables), and has also been
postulated ab initio in Ref. [9]. It first appeared in the context of estimation of photon
number from an optical phase measurement [44].

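To obtain Eq. (10), let $A_f$ denote the observable corresponding to a general estimate of
$A$ from $\mathcal{M}$, where outcome $m$ of $\mathcal{M}$ corresponds to outcome $f(m)$ of $A_f$. The statistical
deviation between $A_f$ and $A$ follows from the first equality in Eq. (A2) as

$$\begin{aligned}
D_\rho(A, A_f)^2 &= \langle A^2 \rangle - \sum_m f(m)\, \mathrm{tr}[\rho (A M_m + M_m A)] + \sum_m f(m)^2\, \mathrm{tr}[\rho M_m] \\
&= \langle A^2 \rangle - 2 \sum_m f(m)\, A_{\mathrm{opt}}(m|\rho)\, \mathrm{tr}[\rho M_m] + \sum_m f(m)^2\, \mathrm{tr}[\rho M_m] \\
&= \langle A^2 \rangle - \sum_m A_{\mathrm{opt}}(m|\rho)^2\, \mathrm{tr}[\rho M_m] + \sum_m [f(m) - A_{\mathrm{opt}}(m|\rho)]^2\, \mathrm{tr}[\rho M_m],
\end{aligned}$$

where $A_{\mathrm{opt}}(m|\rho)$ is the estimate defined in Eq. (10). The last term is nonnegative, and hence
the statistical deviation is minimised by the choice $f(m) = A_{\mathrm{opt}}(m|\rho)$, as per Eq. (10). Note
that choosing $A_f = A_{\mathrm{opt}}$ in the above expression, and using Eq. (11), gives

$$D_\rho(A, A_{\mathrm{opt}})^2 = \langle A^2 \rangle - \langle A_{\mathrm{opt}}^2 \rangle = \mathrm{Var}A - \mathrm{Var}A_{\mathrm{opt}} = D_\rho(A, \langle A \rangle)^2 - D_\rho(A_{\mathrm{opt}}, \langle A \rangle)^2 \tag{A3}$$

for the minimum statistical deviation.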

The proof of Eq. (13) is completely analogous, where the statistical deviation in Eq. (A2)
is replaced by the generalised Hilbert-Schmidt distance

$$\begin{aligned}
d(A, \mathcal{M})^2 &:= \sum_m \mathrm{tr}[M_m (A - m)^2] \\
&= \mathrm{tr}[(A - \bar{M})^2] + \mathrm{tr}[\overline{M^2} - \bar{M}^2],
\end{aligned} \tag{A4}$$

obtained via a natural algebra for POM observables [32]. Note that this measure is
proportional to the average of the square of the statistical deviation over all states.

Finally, it may be asked whether one can define the best possible estimate when $\mathcal{A}$ does not
correspond to a Hermitian operator. This is of interest, for example, if one wants to make the
best estimate of elapsed time or optical phase from the measurement of some observable such
as position or photon number. It turns out that the generalisation of statistical deviation
is highly non-trivial in this case, as certain consistency conditions must be satisfied [32].
However, for the special case of complete observables $\mathcal{A}$ and $\mathcal{M}$ (i.e., with $A_a = |a\rangle\langle a|$,
$M_m = |m\rangle\langle m|$), which further satisfy the condition that no two kets from the combined set
$\{|a\rangle, |m\rangle\}$ are proportional, it follows from Sec. 4 of [32] that the statistical deviation has
the simple generalised form

$$D_\rho(\mathcal{A}, \mathcal{M})^2 = \mathrm{tr}[\rho (\overline{A^2} + \overline{M^2} - \bar{A} \bar{M} - \bar{M} \bar{A})],$$

with $\overline{A^j}$ and $\overline{M^j}$ defined as above. It may be shown that the best possible estimate of $\mathcal{A}$,
from a measurement result $m$ of $\mathcal{M}$ on a known state $\rho$, follows in this case as

$$A_{\mathrm{opt}}(m|\rho) = \frac{\langle m| (\rho \bar{A} + \bar{A} \rho) |m\rangle}{2 \langle m| \rho |m\rangle}. \tag{A5}$$

However, more generally one cannot simply replace $A$ by $\bar{A}$ in Eq. (10).